Graphs and Probz

R Markdown

library(ggplot2)
library(markdown)
library(rmarkdown)
library(tidyr)
library(tidyselect)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✓ tibble  3.1.6     ✓ dplyr   1.0.7
✓ readr   2.1.2     ✓ stringr 1.4.0
✓ purrr   0.3.4     ✓ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

library(readxl)
LungCapData <- read_excel("_data/LungCapData.xls")
# View(LungCapData)  # interactive viewer; run manually in RStudio, not when knitting

m_lung<-mean(LungCapData$LungCap)
sd_lung<-sd(LungCapData$LungCap)

hist(LungCapData$LungCap, prob = TRUE, xlim = c(0, 20))
curve(dnorm(x, m_lung, sd_lung), add = TRUE, lwd = 2, col = "blue")

Section 1 Question 1

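The histogram looks roughly bell-shaped and follows the overlaid normal curve, so LungCap appears approximately normally distributed.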
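
One way to back up the visual impression is a normal Q-Q plot together with a Shapiro-Wilk test. The sketch below runs both on simulated values so that it is self-contained; `lung_sim` and its mean and standard deviation are made-up stand-ins, not values from LungCapData. With the real data, `LungCapData$LungCap` would take its place.

```r
# Hypothetical stand-in for LungCapData$LungCap (simulated, not the real data)
set.seed(1)
lung_sim <- rnorm(200, mean = 8, sd = 2.5)

# Points lying close to the reference line suggest approximate normality
qqnorm(lung_sim)
qqline(lung_sim, col = "blue", lwd = 2)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(lung_sim)
sw$p.value
```

A formal test is only a complement to the plot: with several hundred observations, even small departures from normality can produce a small p-value.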

Question 2

grouped_gender<- LungCapData %>% group_by(Gender)
summarize(grouped_gender)

# A tibble: 2 × 1
  Gender
  <chr> 
1 female
2 male

qplot(data = grouped_gender, x = Gender, y = LungCap, geom = "boxplot")

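Males have higher lung capacity than females: the male box sits higher in the plot. The center line of a boxplot is the median, so the plot suggests, rather than directly shows, a higher mean.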
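
To compare group means directly, summarize() needs a summary expression; called with none, it returns only the group labels. A minimal sketch on a small made-up data frame (`toy`, `gender_stats`, and the numbers are hypothetical, not taken from LungCapData):

```r
library(dplyr)

# Hypothetical toy data; the numbers are made up for illustration
toy <- data.frame(
  Gender  = c("male", "male", "female", "female"),
  LungCap = c(9, 7, 6, 8)
)

# summarize() needs at least one summary expression to report a statistic
gender_stats <- toy %>%
  group_by(Gender) %>%
  summarize(
    mean_lungcap   = mean(LungCap),
    median_lungcap = median(LungCap)
  )
gender_stats
```

The same pattern should carry over to the real data by replacing `toy` with `LungCapData`, and to the smoking comparison by grouping on `Smoke` instead of `Gender`.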

Question 3

Smokers show a higher typical lung capacity than non-smokers in the boxplot below.

grouped_smokers<- LungCapData %>% group_by(Smoke)
summarize(grouped_smokers)

# A tibble: 2 × 1
  Smoke
  <chr>
1 no   
2 yes

qplot(data = grouped_smokers, x = Smoke, y = LungCap, geom = "boxplot")

Question 4

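In the stacked bar chart, the bar is tallest for children ages 0-13, with males contributing the larger share. Because the bars add up LungCap over every child in a group, the height reflects how many children are in the group as well as their lung capacity.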

LungCapData$Agegroups <- cut(LungCapData$Age,
                             breaks = c(-Inf, 13, 15, 17, 20),
                             labels = c("0-13 years", "14-15 years", "16-17 years", "18+ years"))

ggplot(LungCapData, aes(x = LungCap, y = Agegroups, fill = Gender)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_classic()
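
Stacking the raw LungCap values adds them up within each age group, so a comparison of typical lung capacity is easier to read from group means. A minimal sketch on made-up data, reusing the same cut() breaks; the `toy` data frame, its values, and the names `group_means` and `p` are hypothetical:

```r
library(ggplot2)

# Hypothetical toy data; ages and lung capacities are made up
toy <- data.frame(
  Age     = c(10, 12, 13, 14, 15, 16, 18, 19),
  Gender  = c("male", "female", "male", "female",
              "male", "female", "male", "female"),
  LungCap = c(6, 5, 7, 8, 9, 9, 11, 10)
)

# Intervals from cut() are right-closed, so age 13 falls in "0-13 years"
toy$Agegroups <- cut(toy$Age,
                     breaks = c(-Inf, 13, 15, 17, 20),
                     labels = c("0-13 years", "14-15 years",
                                "16-17 years", "18+ years"))

# Mean LungCap per age group and gender
group_means <- aggregate(LungCap ~ Agegroups + Gender, data = toy, FUN = mean)

# Side-by-side bars of the means, so bar height no longer depends on group size
p <- ggplot(group_means, aes(x = Agegroups, y = LungCap, fill = Gender)) +
    geom_col(position = "dodge") +
    theme_classic()
p
```

Swapping `Gender` for `Smoke` in both the aggregate() formula and the `fill` aesthetic would give the matching view for the smoking question.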

Question 5

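Smoking does not look good for lung capacity under the age of 18, or at any age: smokers account for a smaller part of every bar. The stacked bars sum LungCap over all children in a group, so the gap may partly reflect how many smokers are in each group.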

ggplot(LungCapData, aes(x = LungCap, y = Agegroups, fill = Smoke)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_classic()

Question 6

covar<-cov(LungCapData$LungCap, LungCapData$Age)
print(covar)

[1] 8.738289

corre<-cor(LungCapData$LungCap, LungCapData$Age, method = "pearson")
print(corre)

[1] 0.8196749
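
The two numbers describe the same relationship on different scales: the Pearson correlation is the covariance divided by the product of the two standard deviations. A small sketch of that identity on made-up vectors (`age`, `lungcap`, and their values are hypothetical, not the real data):

```r
# Hypothetical toy vectors; values are made up
age     <- c(5, 8, 11, 14, 17)
lungcap <- c(4.1, 6.3, 7.9, 9.8, 11.2)

cov_xy <- cov(lungcap, age)
cor_xy <- cor(lungcap, age, method = "pearson")

# Pearson correlation = covariance / (sd of x * sd of y)
cor_manual <- cov_xy / (sd(lungcap) * sd(age))
all.equal(cor_xy, cor_manual)
```

Because it is rescaled, the correlation always lies between -1 and 1, while the covariance depends on the units of both variables.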

Section 2 Question 2

a<-128/810
a

[1] 0.1580247

b<-434/810
b

[1] 0.5358025

c<-160/810
c

[1] 0.1975309

d<-64/810
d

[1] 0.07901235

e<-24/810
e

[1] 0.02962963

ei<-((a*0)+(b*1)+(c*2)+(d*3)+(e*4))
ei

[1] 1.28642

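# Variance of a discrete random variable: weight each squared deviation by its probability
varei<-(a*(0-ei)^2)+(b*(1-ei)^2)+(c*(2-ei)^2)+(d*(3-ei)^2)+(e*(4-ei)^2)
varei

[1] 0.8562353

sdei<-sqrt(varei)
sdei

[1] 0.9253298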
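
For a discrete distribution, the variance weights each squared deviation by its probability. The sketch below redoes the expected value, variance, and standard deviation with vectors as a cross-check; the names `x`, `p`, `ex`, `vx`, `sx`, and `vx_alt` are introduced here for illustration, and the counts are the ones used above.

```r
# Outcomes 0..4 and their counts out of 810, as in the calculations above
x <- 0:4
p <- c(128, 434, 160, 64, 24) / 810
stopifnot(isTRUE(all.equal(sum(p), 1)))  # probabilities must sum to 1

ex <- sum(x * p)           # expected value, E[X]
vx <- sum(p * (x - ex)^2)  # variance: sum of p * (x - E[X])^2
sx <- sqrt(vx)             # standard deviation

# The shortcut formula E[X^2] - E[X]^2 should agree with vx
vx_alt <- sum(x^2 * p) - ex^2

c(mean = ex, var = vx, sd = sx)
```

Keeping the outcomes and probabilities in two vectors avoids retyping each term and makes it easy to confirm that the probabilities sum to 1 before using them.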